Image processing apparatus, image processing system, image processing method, and storage medium

ABSTRACT

An image processing system includes an image obtaining unit that obtains images based on capturing from plural directions by plural cameras, an information obtaining unit that obtains viewpoint information indicating a virtual viewpoint, and a generation unit configured to generate virtual viewpoint images on a basis of the obtained images and viewpoint information. The generation unit generates a first virtual viewpoint image outputted to a display apparatus that displays an image for a user to specify a virtual viewpoint and a second virtual viewpoint image outputted to an output destination different from the display apparatus by using at least one of data generated in a process for generating the first virtual viewpoint image by image processing using the plural images obtained by the image obtaining unit and the first virtual viewpoint image, the second virtual viewpoint image having a higher image quality than that of the first virtual viewpoint image.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a Continuation of U.S. patent application Ser. No. 16/396,281, filed on Apr. 26, 2019, which is a Continuation of International Patent Application No. PCT/JP2017/037978, filed Oct. 20, 2017, which claims the benefit of Japanese Patent Application No. 2016-211905, filed Oct. 28, 2016, each of which is hereby incorporated by reference herein in their entirety.

TECHNICAL FIELD

The present invention relates to a technology for generating a virtual viewpoint image.

BACKGROUND ART

In recent years, a technology for capturing a subject from multiple viewpoints by installing a plurality of cameras in different positions and generating a virtual viewpoint image or a three-dimensional model by using a plurality of viewpoint images obtained by the capturing has attracted attention. According to the technology for generating the virtual viewpoint image from the plurality of viewpoint images as described above, for example, since a highlight scene in soccer or basketball can be viewed from various angles, it is possible to provide a more realistic sensation to a user as compared with a normal image.

PTL 1 describes that an image quality of the virtual viewpoint image is improved by decreasing units of rendering in a boundary area of an object in the image in a case where a virtual viewpoint image is to be generated by combining the images captured from the plurality of viewpoints with one another.

CITATION LIST

Patent Literature

PTL 1 Japanese Patent Laid-Open No. 2013-223008

However, according to the related-art technology, it is conceivable that a virtual viewpoint image in accordance with a plurality of different requirements with regard to an image quality cannot be generated in some cases. For example, in a case where only the virtual viewpoint image having the high image quality is to be generated, it is conceivable that a processing time related to the generation is lengthened, and there is a fear that it becomes difficult to respond to a desire of a user who would like to observe the virtual viewpoint image in real time even though the image quality is low. On the other hand, in a case where only the virtual viewpoint image having the low image quality is to be generated, there is a fear that it becomes difficult to respond to a desire of a user who prioritizes the high image quality of the virtual viewpoint image over real-time property.

The present invention has been made in view of the above-described problem and is aimed at generating a virtual viewpoint image in accordance with a plurality of different requirements with regard to an image quality.

SUMMARY OF INVENTION

To solve the above-described problem, an image processing apparatus according to the present invention includes, for example, the following configuration. That is, the image processing apparatus includes an image obtaining unit configured to obtain images based on capturing from a plurality of directions by a plurality of cameras, an information obtaining unit configured to obtain viewpoint information indicating a virtual viewpoint, and a generation unit configured to generate virtual viewpoint images on a basis of the images obtained by the image obtaining unit and the viewpoint information obtained by the information obtaining unit, in which the generation unit is configured to generate a first virtual viewpoint image to be outputted to a display apparatus that displays an image for a user to specify a virtual viewpoint and also configured to generate a second virtual viewpoint image to be outputted to an output destination different from the display apparatus by using at least one of data generated in a process for generating the first virtual viewpoint image by image processing using the plurality of images obtained by the image obtaining unit and the first virtual viewpoint image, the second virtual viewpoint image having a higher image quality than that of the first virtual viewpoint image.

Further features of the present invention will become apparent from the following description of exemplary embodiments with reference to the attached drawings.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is an explanatory diagram for describing a configuration of an image processing system 10.

FIG. 2 is an explanatory diagram for describing a hardware configuration of an image processing apparatus 1.

FIG. 3 is a flow chart for describing one mode of an operation of the image processing apparatus 1.

FIG. 4 is an explanatory diagram for describing a configuration of a display screen by a display apparatus 3.

FIG. 5 is a flow chart for describing one mode of the operation of the image processing apparatus 1.

FIG. 6 is a flow chart for describing one mode of the operation of the image processing apparatus 1.

DESCRIPTION OF EMBODIMENTS

System Configuration

Hereinafter, embodiments of the present invention will be described with reference to the drawings. First, a configuration of an image processing system 10 that generates and outputs a virtual viewpoint image will be described by using FIG. 1. The image processing system 10 according to the present embodiment includes an image processing apparatus 1, a camera group 2, a display apparatus 3, and a display apparatus 4.

It should be noted that the virtual viewpoint image according to the present embodiment is an image obtained in a case where a subject is captured from a virtual viewpoint. In other words, the virtual viewpoint image is an image representing an appearance at a specified viewpoint. The virtual viewpoint (imaginary viewpoint) may be specified by a user or may also be automatically specified on the basis of a result of an image analysis or the like. That is, the virtual viewpoint image includes an arbitrary viewpoint image (free viewpoint image) corresponding to a viewpoint arbitrarily specified by the user. In addition, an image corresponding to a viewpoint specified by the user from among a plurality of candidates and an image corresponding to a viewpoint automatically specified by an apparatus are also included in the virtual viewpoint image. It should be noted that, according to the present embodiment, a case where the virtual viewpoint image is a moving image will be mainly described, but the virtual viewpoint image may also be a still image.

The camera group 2 includes a plurality of cameras, and each of the cameras captures a subject from a respectively different direction. According to the present embodiment, each of the plurality of cameras included in the camera group 2 is connected to the image processing apparatus 1 and transmits a captured image, a parameter of each camera, and the like to the image processing apparatus 1. It should be noted however that the configuration is not limited to this, and the plurality of cameras included in the camera group 2 may be mutually communicable, and any one of the cameras included in the camera group 2 may transmit the captured images by the plurality of cameras, the parameters of the plurality of cameras, and the like to the image processing apparatus 1. In addition, instead of the captured images, any one of the cameras included in the camera group 2 may transmit an image based on the capturing by the camera group 2 such as an image generated on the basis of a difference between the captured images by the plurality of cameras.

The display apparatus 3 accepts the specification of the virtual viewpoint for generating the virtual viewpoint image and transmits the information in accordance with the specification to the image processing apparatus 1. For example, the display apparatus 3 includes an input unit such as a joystick, a jog dial, a touch panel, a keyboard, or a mouse, and the user (operator) who specifies the virtual viewpoint operates the input unit to specify the virtual viewpoint. The user according to the present embodiment is an operator who operates an input unit of the display apparatus 3 to specify the virtual viewpoint or a viewer who observes the virtual viewpoint image displayed by the display apparatus 4, and the term "user" is simply used in a case where the operator and the viewer are not particularly distinguished from each other. According to the present embodiment, the case where the viewer is different from the operator will be mainly described, but the configuration is not limited to this, and the viewer and the operator may be the same user. It should be noted that, according to the present embodiment, the information in accordance with the specification of the virtual viewpoint which is transmitted from the display apparatus 3 to the image processing apparatus 1 is the virtual viewpoint information indicating the position and the orientation of the virtual viewpoint. It should be noted however that the configuration is not limited to this, and the information in accordance with the specification of the virtual viewpoint may be information indicating the contents determined in accordance with the virtual viewpoint such as the shape or the orientation of the subject in the virtual viewpoint image, and the image processing apparatus 1 may generate the virtual viewpoint image on the basis of the above-described information in accordance with the specification of the virtual viewpoint.

Furthermore, the display apparatus 3 displays the virtual viewpoint image generated and output by the image processing apparatus 1 on the basis of the images based on the capturing by the camera group 2 and the specification of the virtual viewpoint accepted by the display apparatus 3. According to this, the operator can perform the specification of the virtual viewpoint while observing the virtual viewpoint image displayed on the display apparatus 3. It should be noted that, according to the present embodiment, the display apparatus 3 that displays the virtual viewpoint image is configured to accept the specification of the virtual viewpoint, but the configuration is not limited to this. For example, the apparatus that accepts the specification of the virtual viewpoint and the display apparatus that displays the virtual viewpoint image for the operator to specify the virtual viewpoint may be separate apparatuses.

The display apparatus 3 also performs a generation instruction for starting the generation of the virtual viewpoint image with respect to the image processing apparatus 1 on the basis of the operation by the operator. It should be noted that the generation instruction is not limited to this and may be an instruction for the image processing apparatus 1 to reserve the generation of the virtual viewpoint image such that the generation of the virtual viewpoint image is started at a predetermined time, for example. In addition, the generation instruction may be an instruction for a reservation such that the generation of the virtual viewpoint image is started in a case where a predetermined event occurs, for example. It should be noted that the apparatus that performs the generation instruction of the virtual viewpoint image with respect to the image processing apparatus 1 may be an apparatus different from the display apparatus 3, and the user may directly input the generation instruction with respect to the image processing apparatus 1.

The display apparatus 4 displays, to the user (viewer) different from the operator who specifies the virtual viewpoint, the virtual viewpoint image generated by the image processing apparatus 1 on the basis of the specification of the virtual viewpoint by the operator who has used the display apparatus 3. It should be noted that the image processing system 10 may include a plurality of display apparatuses 4, and the plurality of display apparatuses 4 may display respectively different virtual viewpoint images. For example, the display apparatus 4 that displays the virtual viewpoint image (live image) to be broadcast live and the display apparatus 4 that displays the virtual viewpoint image (non-live image) to be broadcast after recording may be included in the image processing system 10.

The image processing apparatus 1 includes a camera information obtaining unit 100, a virtual viewpoint information obtaining unit 110 (hereinafter, the viewpoint obtaining unit 110), an image generation unit 120, and an output unit 130. The camera information obtaining unit 100 obtains the images based on the capturing by the camera group 2, external parameters and internal parameters of the respective cameras included in the camera group 2, and the like from the camera group 2 to be output to the image generation unit 120. The viewpoint obtaining unit 110 obtains the information in accordance with the specification of the virtual viewpoint by the operator from the display apparatus 3 to be output to the image generation unit 120. The viewpoint obtaining unit 110 also accepts the generation instruction of the virtual viewpoint image by the display apparatus 3. The image generation unit 120 generates the virtual viewpoint image on the basis of the images based on the capturing which are obtained by the camera information obtaining unit 100, the information in accordance with the specification obtained by the viewpoint obtaining unit 110, and the generation instruction accepted by the viewpoint obtaining unit 110 to be output to the output unit 130. The output unit 130 outputs the virtual viewpoint image generated by the image generation unit 120 to the external apparatus such as the display apparatus 3 or the display apparatus 4.

It should be noted that, according to the present embodiment, the image processing apparatus 1 generates the plurality of virtual viewpoint images having the different image qualities to be output to the output destinations in accordance with the respective virtual viewpoint images. For example, the virtual viewpoint image having the low image quality in which processing time related to the generation is short is output to the display apparatus 4 observed by the viewer who desires the real-time (low-delay) virtual viewpoint image. On the other hand, the virtual viewpoint image having the high image quality in which the processing time related to the generation is long is output to the display apparatus 4 observed by the viewer who desires the virtual viewpoint image having the high image quality. It should be noted that the delay according to the present embodiment corresponds to a period from when the capturing by the camera group 2 is performed until the virtual viewpoint image based on the capturing is displayed. It should be noted however that the definition of the delay is not limited to this, and for example, a time difference between a real-world time and a time corresponding to the displayed image may be set as the delay.

Subsequently, a hardware configuration of the image processing apparatus 1 will be described by using FIG. 2. The image processing apparatus 1 includes a CPU 201, a ROM 202, a RAM 203, an auxiliary storage device 204, a display unit 205, an operation unit 206, a communication unit 207, and a bus 208. The CPU 201 controls the entirety of the image processing apparatus 1 by using the computer programs and data stored in the ROM 202 or the RAM 203. It should be noted that the image processing apparatus 1 may include a GPU (Graphics Processing Unit), and the GPU may perform at least part of the processing by the CPU 201. The ROM 202 stores the programs and parameters that do not require changes. The RAM 203 temporarily stores the programs and data supplied from the auxiliary storage device 204, data supplied from the outside via the communication unit 207, and the like. The auxiliary storage device 204 is constituted, for example, by a hard disc drive or the like and stores contents data such as a still image and a moving image.

The display unit 205 is constituted, for example, by a liquid crystal display or the like and displays a GUI (Graphical User Interface) for the user to operate the image processing apparatus 1 and the like. The operation unit 206 is constituted, for example, by a keyboard, a mouse, or the like and accepts the operations by the user and inputs various instructions to the CPU 201. The communication unit 207 performs a communication with an external apparatus such as the camera group 2, the display apparatus 3, or the display apparatus 4. For example, a LAN cable or the like is connected to the communication unit 207 in a case where the image processing apparatus 1 is connected to the external apparatus in a wired manner. It should be noted that, in a case where the image processing apparatus 1 includes a function for wirelessly communicating with an external apparatus, the communication unit 207 is provided with an antenna. The bus 208 transmits the information by connecting the respective units of the image processing apparatus 1 to each other.

It should be noted that, according to the present embodiment, the display unit 205 and the operation unit 206 exist inside the image processing apparatus 1, but a configuration may also be adopted in which the image processing apparatus 1 is not provided with at least one of the display unit 205 and the operation unit 206. In addition, at least one of the display unit 205 and the operation unit 206 may exist outside the image processing apparatus 1 as another apparatus, and the CPU 201 may operate as a display control unit that controls the display unit 205 and an operation control unit that controls the operation unit 206.

Operation Flow

Next, one mode of an operation of the image processing apparatus 1 will be described by using FIG. 3. The processing illustrated in FIG. 3 is started when the viewpoint obtaining unit 110 performs the acceptance of the generation instruction of the virtual viewpoint image and is repeated periodically (for example, every frame in a case where the virtual viewpoint image is a moving image). It should be noted however that the starting timing of the processing illustrated in FIG. 3 is not limited to the above-described timing. The processing illustrated in FIG. 3 is realized when the CPU 201 expands the programs stored in the ROM 202 into the RAM 203 to be executed. It should be noted that at least part of the processing illustrated in FIG. 3 may be realized by dedicated-use hardware different from the CPU 201.

In the flow illustrated in FIG. 3, S2010 and S2020 correspond to processing for obtaining the information, and S2030 to S2050 correspond to processing for generating and outputting the virtual viewpoint image (specification image) for the operator to specify the virtual viewpoint. In addition, S2070 to S2100 correspond to processing for generating and outputting the live image. S2110 to S2130 correspond to processing for generating and outputting the non-live image. Hereinafter, details of the processes in the respective steps will be described.
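As an illustration only, the branching of the flow in FIG. 3 can be sketched as follows. The function name `steps_for_frame` and the two boolean flags are hypothetical and stand in for the determinations made in S2060 and S2100; the sketch records which steps run for one frame and does not perform any image processing.

```python
def steps_for_frame(need_live: bool, need_non_live: bool) -> list[str]:
    """Return the step labels of FIG. 3 executed for one frame.

    need_live and need_non_live are hypothetical flags standing in for the
    determinations made in S2060 and S2100, respectively.
    """
    # Obtain information, then generate and output the specification image.
    steps = ["S2010", "S2020", "S2030", "S2040", "S2050", "S2060"]
    if not need_live:
        return steps  # only the low-quality specification image is needed
    # Refine the shape, color it, and output the live image.
    steps += ["S2070", "S2080", "S2090", "S2100"]
    if not need_non_live:
        return steps  # live broadcast only
    # Refine further, smooth the boundary, and output the non-live image.
    steps += ["S2110", "S2120", "S2130"]
    return steps
```

Each higher-quality stage reuses the result of the preceding stage, which is why the stages appear as a single chain with two early exits rather than as three independent pipelines.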

In S2010, the camera information obtaining unit 100 obtains the captured images of the respective cameras based on the capturing by the camera group 2 and the external parameters and the internal parameters of the respective cameras. The external parameter is information with regard to a position and an orientation of the camera, and the internal parameter is information with regard to a focal length and an image center of the camera.
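To illustrate how these parameters are typically used, the following minimal sketch projects a three-dimensional point onto a camera image with a pinhole model. The world-to-camera convention (rotation matrix and translation vector), the single focal length in pixel units, and the absence of lens distortion are assumptions of this sketch and are not specified in the present embodiment; the function name `project` is hypothetical.

```python
def project(point, rotation, translation, focal, center):
    """Project a 3D world point to pixel coordinates (u, v).

    rotation (3x3 nested list) and translation (length 3) play the role of
    the external parameter; focal and center (cx, cy) play the role of the
    internal parameter. Returns None for points not in front of the camera.
    """
    # World coordinates -> camera coordinates: Xc = R * Xw + t
    xc = [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if xc[2] <= 0:
        return None  # behind (or on) the camera plane
    # Perspective division followed by the shift to the image center.
    u = focal * xc[0] / xc[2] + center[0]
    v = focal * xc[1] / xc[2] + center[1]
    return (u, v)


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(project((1, 2, 4), identity, (0, 0, 0), 100, (320, 240)))  # (345.0, 290.0)
```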

In S2020, the viewpoint obtaining unit 110 obtains the virtual viewpoint information as the information in accordance with the specification of the virtual viewpoint by the operator. According to the present embodiment, the virtual viewpoint information corresponds to an external parameter and an internal parameter of a virtual camera that captures the subject from the virtual viewpoint, and one piece of virtual viewpoint information is needed to generate one frame of the virtual viewpoint image.

In S2030, the image generation unit 120 estimates a three-dimensional shape of an object corresponding to the subject on the basis of the captured images by the camera group 2. The object corresponding to the subject is, for example, a person, a moving object, or the like that exists in a capturing range of the camera group 2. The image generation unit 120 calculates differences between the captured images obtained from the camera group 2 and previously obtained background images corresponding to the respective cameras to generate silhouette images in which a part (foreground area) corresponding to the object in the captured image is extracted. The image generation unit 120 then uses the silhouette images corresponding to the respective cameras and the parameters of the respective cameras to estimate the three-dimensional shape of the object. For example, a Visual Hull technique is used for the estimation of the three-dimensional shape. As a result of this processing, a 3D point group (set of points having three-dimensional coordinates) that represents the three-dimensional shape of the object corresponding to the subject is obtained. It should be noted that the method of deriving the three-dimensional shape of the object from the captured images by the camera group 2 is not limited to this.
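The two stages described for S2030 can be sketched as follows, under simplifying assumptions: images are grayscale nested lists, the difference threshold is an arbitrary value, and each camera is represented by a hypothetical projection function that maps a 3D point to pixel coordinates (column, row). The names `silhouette` and `visual_hull` are illustrative. A candidate point is kept only when its projection falls inside the silhouette of every camera, which is the basic idea of a Visual Hull.

```python
def silhouette(captured, background, threshold=30):
    """Mark pixels whose difference from the background exceeds threshold."""
    return [
        [abs(c - b) > threshold for c, b in zip(c_row, b_row)]
        for c_row, b_row in zip(captured, background)
    ]


def visual_hull(candidates, cameras):
    """Keep candidate points that project into every silhouette.

    cameras is a list of (project_fn, silhouette_image) pairs, where
    project_fn(point) returns (column, row) or None (hypothetical interface).
    """
    kept = []
    for point in candidates:
        inside_all = True
        for project_fn, sil in cameras:
            uv = project_fn(point)
            if uv is None:
                inside_all = False
                break
            col, row = int(round(uv[0])), int(round(uv[1]))
            in_image = 0 <= row < len(sil) and 0 <= col < len(sil[0])
            if not in_image or not sil[row][col]:
                inside_all = False
                break
        if inside_all:
            kept.append(point)
    return kept


# Toy example with two orthographic "cameras" on a 3x3x3 grid.
background = [[0] * 3 for _ in range(3)]
front = [[0, 0, 0], [0, 200, 0], [0, 0, 0]]    # sees x as column, y as row
side = [[0, 0, 0], [200, 200, 0], [0, 0, 0]]   # sees z as column, y as row
cameras = [
    (lambda p: (p[0], p[1]), silhouette(front, background)),
    (lambda p: (p[2], p[1]), silhouette(side, background)),
]
grid = [(x, y, z) for x in range(3) for y in range(3) for z in range(3)]
print(visual_hull(grid, cameras))  # [(1, 1, 0), (1, 1, 1)]
```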

In S2040, the image generation unit 120 performs rendering of the 3D point group and a background 3D model on the basis of the obtained virtual viewpoint information and generates the virtual viewpoint image. The background 3D model is, for example, a CG model such as a racing ground where the camera group 2 is installed and is previously created to be saved in the image processing system 10. In the virtual viewpoint image generated by the processing thus far, the area corresponding to the object and the background area are respectively displayed in predetermined colors (for example, a single color). It should be noted that the processing for performing the rendering of the 3D point group and the background 3D model is already known in the field of gaming and cinema, and a method of promptly performing the processing is known such as, for example, a method of performing the processing by using the GPU. For this reason, the virtual viewpoint image generated in the processing up to S2040 can be promptly generated in accordance with the capturing by the camera group 2 and the specification of the virtual viewpoint by the operator.

In S2050, the output unit 130 outputs the virtual viewpoint image generated in S2040 by the image generation unit 120 to the display apparatus 3 for the operator to specify the virtual viewpoint. Here, a screen configuration of a display screen 30 of the display apparatus 3 will be described by using FIG. 4. The display screen 30 is constituted by an area 310, an area 320, and an area 330. For example, the virtual viewpoint image generated as the specification image is displayed in the area 310, the virtual viewpoint image generated as the live image is displayed in the area 320, and the virtual viewpoint image generated as the non-live image is displayed in the area 330. That is, the virtual viewpoint image generated in S2040 and output in S2050 is displayed in the area 310. The operator then performs the specification of the virtual viewpoint while observing the screen of the area 310. It should be noted that it is sufficient when the display apparatus 3 displays at least the specification image, and the display apparatus 3 does not necessarily need to display the live image and the non-live image.

In S2060, the image generation unit 120 determines whether or not the processing for generating the virtual viewpoint image having the higher image quality than the virtual viewpoint image generated in S2040 is performed. For example, in a case where only the image having the low image quality for specifying the virtual viewpoint is needed, the flow does not proceed to S2070, and the processing is ended. On the other hand, in a case where the image having the higher image quality is needed, the flow proceeds to S2070, and the processing continues.

In S2070, the image generation unit 120 further increases the accuracy of the shape model of the object (3D point group) which is estimated in S2030 by using a Photo Hull technique, for example. Specifically, by projecting the respective points of the 3D point group onto the captured images of the respective cameras and evaluating color matching rates in the respective captured images, it is determined whether or not the point is a point necessary to represent the subject shape. For example, in a case where, with regard to a certain point in the 3D point group, a variance of the pixel values at the projection destinations is higher than a threshold, it is determined that the point is not correct as the point representing the subject shape, and the point is deleted from the 3D point group. This processing is performed with respect to all of the points in the 3D point group to realize the increase in the accuracy of the shape model of the object. It should be noted that the method of increasing the accuracy of the shape model of the object is not limited to this.
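The variance test described for S2070 can be sketched as follows. This is a simplified illustration and not a full Photo Hull implementation: pixel values are single-channel numbers, the projection and sampling step is hidden behind a hypothetical callable `samples_for_point`, and keeping points with fewer than two samples is an assumption of the sketch.

```python
from statistics import pvariance


def refine_points(points, samples_for_point, threshold):
    """Delete points whose projected pixel values disagree across cameras.

    samples_for_point(point) is a hypothetical callable returning the pixel
    values found at the projection destinations of the point.
    """
    kept = []
    for point in points:
        values = samples_for_point(point)
        # Assumption: a point with fewer than two samples cannot be tested
        # for consistency and is therefore kept.
        if len(values) < 2 or pvariance(values) <= threshold:
            kept.append(point)
    return kept


samples = {
    "a": [100, 102, 98],   # consistent colors: the point is kept
    "b": [10, 200, 90],    # inconsistent colors: the point is deleted
}
print(refine_points(["a", "b"], samples.get, threshold=50))  # ['a']
```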

In S2080, the image generation unit 120 executes processing for coloring the 3D point group in which the accuracy is increased in S2070 and projecting it onto the coordinates of the virtual viewpoint to generate a foreground image corresponding to the foreground area and processing for generating a background image as viewed from the virtual viewpoint. The image generation unit 120 then overlaps the foreground image onto the generated background image to generate the virtual viewpoint image as the live image.

Herein, an example of the method of generating the foreground image (image of the area corresponding to the object) of the virtual viewpoint image will be described. The processing for coloring the 3D point group is executed to generate the foreground image. The coloring processing is constituted by visibility determination of the point and calculation processing of the color. In the visibility determination, it is possible to identify the cameras that can perform the capturing with regard to the respective points from positional relationships between the respective points in the 3D point group and the plurality of cameras included in the camera group 2. Next, with regard to the respective points, a point is projected onto the captured image of the camera that can capture the point, and a color of a pixel at the projection destination is set as the color of the point. In a case where a certain point is captured by a plurality of cameras, the point is projected onto the captured images of the plurality of cameras, and pixel values at the projection destination are obtained, so that the color of the point is decided by calculating an average of the pixel values. When the rendering of the thus colored 3D point group is performed by a related-art CG rendering technology, it is possible to generate the foreground image of the virtual viewpoint image.
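The coloring of a single point can be sketched as follows. Each camera is represented here by two hypothetical callables, one for the visibility determination and one that returns the RGB pixel value at the projection destination; how these are implemented is outside the sketch.

```python
def color_point(point, cameras):
    """Average the RGB samples from all cameras that can capture the point.

    cameras is a list of (is_visible, sample_rgb) pairs of hypothetical
    callables. Returns None when no camera can capture the point.
    """
    samples = [
        sample_rgb(point)
        for is_visible, sample_rgb in cameras
        if is_visible(point)
    ]
    if not samples:
        return None
    count = len(samples)
    # Per-channel average over the visible cameras.
    return tuple(sum(s[ch] for s in samples) / count for ch in range(3))


cameras = [
    (lambda p: True, lambda p: (100, 0, 0)),
    (lambda p: True, lambda p: (200, 100, 0)),
    (lambda p: False, lambda p: (0, 0, 255)),  # occluded: ignored
]
print(color_point((0, 0, 0), cameras))  # (150.0, 50.0, 0.0)
```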

Next, an example of the method of generating the background image of the virtual viewpoint image will be described. First, vertices of the background 3D model (for example, points corresponding to edges of the racing ground) are set. Then, these vertices are projected onto coordinate systems of two cameras (set as a first camera and a second camera) close to the virtual viewpoint and a coordinate system of the virtual viewpoint. In addition, a first projection matrix between the virtual viewpoint and the first camera and a second projection matrix between the virtual viewpoint and the second camera are calculated by using corresponding points of the virtual viewpoint and the first camera and corresponding points of the virtual viewpoint and the second camera. Then, the captured image of the first camera and the captured image of the second camera are projected onto the respective pixels of the background image by using the first projection matrix and the second projection matrix, and the average of the pixel values at the projection destination is calculated, so that the pixel values of the background image are decided. It should be noted that the pixel values of the background image may be decided from the captured images of three or more cameras by a similar method.
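The final averaging step can be sketched as follows, assuming that each projection matrix has already been calculated and is given as a 3x3 matrix that maps a pixel of the background image to a pixel of the corresponding camera image. The estimation of the matrices from the corresponding points is omitted, and the nearest-pixel sampling and the function names are assumptions of this sketch.

```python
def apply_matrix(h, u, v):
    """Map pixel (u, v) through a 3x3 projection matrix h.

    Assumes the third homogeneous coordinate w is nonzero.
    """
    x = h[0][0] * u + h[0][1] * v + h[0][2]
    y = h[1][0] * u + h[1][1] * v + h[1][2]
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    return x / w, y / w


def sample(image, u, v):
    """Nearest-pixel lookup; returns None outside the image."""
    col, row = int(round(u)), int(round(v))
    if 0 <= row < len(image) and 0 <= col < len(image[0]):
        return image[row][col]
    return None


def background_pixel(u, v, views):
    """Average the values that the given camera views provide for (u, v).

    views is a list of (matrix, image) pairs, e.g. the first and the second
    camera. Returns None when no camera covers the pixel.
    """
    values = []
    for matrix, image in views:
        cu, cv = apply_matrix(matrix, u, v)
        value = sample(image, cu, cv)
        if value is not None:
            values.append(value)
    return sum(values) / len(values) if values else None


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
shift_right = [[1, 0, 1], [0, 1, 0], [0, 0, 1]]  # camera pixel = (u + 1, v)
views = [
    (identity, [[10, 20], [30, 40]]),
    (shift_right, [[0, 60], [0, 80]]),
]
print(background_pixel(0, 0, views))  # 35.0
```

Because `background_pixel` accepts a list of views, the same sketch covers the case of three or more cameras mentioned above.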

The colored virtual viewpoint image can be generated by overlapping the foreground image on the thus obtained background image of the virtual viewpoint image. That is, the virtual viewpoint image generated in S2080 has the higher image quality than the virtual viewpoint image generated in S2040 with regard to the number of gradations of the colors. Conversely, the number of gradations of the colors included in the virtual viewpoint image generated in S2040 is lower than the number of gradations of the colors included in the virtual viewpoint image generated in S2080. It should be noted that the method of adding the color information to the virtual viewpoint image is not limited to this.

In S2090, the output unit 130 outputs the virtual viewpoint image generated in S2080 by the image generation unit 120 to the display apparatus 3 and the display apparatus 4 as the live image. The image output to the display apparatus 3 is displayed in the area 320 and can be observed by the operator, and the image output to the display apparatus 4 can be observed by the viewer.

In S2100, the image generation unit 120 determines whether or not the processing for generating the virtual viewpoint image having the higher image quality than the virtual viewpoint image generated in S2080 is performed. For example, in a case where the virtual viewpoint image is only provided to be broadcast live with respect to the viewer, the flow does not proceed to S2110, and the processing is ended. On the other hand, in a case where the image having the higher image quality is to be broadcast towards the viewer after recording, the flow proceeds to S2110, and the processing continues.

In S2110, the image generation unit 120 further increases the accuracy of the shape model of the object generated in S2070. According to the present embodiment, the increase in the accuracy is realized by deleting an isolated point of the shape model. In the isolated point removal, first, with regard to a voxel set (3D point group) calculated by Photo Hull, whether or not another voxel exists in the surrounding of the respective voxels is investigated. In a case where the voxel does not exist in the surrounding, it is determined that the voxel is the isolated point, and the voxel is deleted from the voxel set. When the processing similar to S2080 is executed by using the shape model from which the isolated point is thus deleted, the virtual viewpoint image is generated in which the higher accuracy of the shape of the object is obtained than the virtual viewpoint image generated in S2080.
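The isolated point removal can be sketched as follows, assuming that voxels are given as integer grid coordinates and that "the surrounding" means the 26 adjacent grid cells. The extent of the surrounding is not specified in the present embodiment, so the neighborhood is an assumption of this sketch, as is the function name.

```python
from itertools import product


def remove_isolated(voxels):
    """Return the voxel set without voxels that have no adjacent voxel.

    voxels is an iterable of integer (x, y, z) tuples. The 26-neighborhood
    used here is an assumption of this sketch.
    """
    occupied = set(voxels)
    offsets = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    return {
        v
        for v in occupied
        if any(
            (v[0] + dx, v[1] + dy, v[2] + dz) in occupied
            for dx, dy, dz in offsets
        )
    }


voxels = {(0, 0, 0), (1, 0, 0), (5, 5, 5)}
print(sorted(remove_isolated(voxels)))  # [(0, 0, 0), (1, 0, 0)]
```

Neighbors are looked up in the original set, so a voxel is judged against the shape model as it was before any deletion in the same pass.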

In S2120, the image generation unit 120 applies smoothing processing to a boundary between the foreground area of the virtual viewpoint image generated in S2110 and the background area and corrects the image such that a boundary area is smoothly displayed.
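One possible form of such smoothing is sketched below. The present embodiment does not specify the filter, so the 3x3 mean filter, the definition of a boundary pixel (a pixel with a 4-neighbor on the other side of the foreground mask), and the grayscale image representation are all assumptions of this sketch.

```python
def smooth_boundary(image, mask):
    """Apply a 3x3 mean filter only to pixels on the foreground boundary.

    image is a grayscale nested list; mask marks foreground pixels as True.
    """
    height, width = len(image), len(image[0])

    def is_boundary(r, c):
        # A pixel is on the boundary when a 4-neighbor has a different label.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < height and 0 <= nc < width and mask[nr][nc] != mask[r][c]:
                return True
        return False

    result = [row[:] for row in image]  # leave the input image unmodified
    for r in range(height):
        for c in range(width):
            if is_boundary(r, c):
                window = [
                    image[nr][nc]
                    for nr in range(max(0, r - 1), min(height, r + 2))
                    for nc in range(max(0, c - 1), min(width, c + 2))
                ]
                result[r][c] = sum(window) / len(window)
    return result


image = [[0, 0, 100, 100]]
mask = [[False, False, True, True]]
smoothed = smooth_boundary(image, mask)
# The outer pixels are unchanged; the two pixels at the boundary are blended.
```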

In S2130, the output unit 130 outputs the virtual viewpoint image generated by the image generation unit 120 in S2120 to the display apparatus 3 and the display apparatus 4 as the non-live image. The non-live image output to the display apparatus 3 is displayed in the area 330.

By the above-described processing, the image processing apparatus 1 generates, on the basis of one set of the captured images and the virtual viewpoint information, the virtual viewpoint image as the specification image for the operator to specify the virtual viewpoint and the live image, which is the virtual viewpoint image having the higher image quality than the specification image and which is to be displayed to the viewer. Herein, the live image is generated on the basis of the specification of the virtual viewpoint by the operator. Specifically, the live image is the virtual viewpoint image corresponding to the virtual viewpoint decided in accordance with the specification operation by the operator with respect to the specification image. In addition, the image processing apparatus 1 also generates the non-live image corresponding to the virtual viewpoint image having the higher image quality than the live image. The image processing apparatus 1 then outputs the generated live image and non-live image to the display apparatus 4 such that the live image is displayed before the non-live image is displayed. The image processing apparatus 1 also outputs the generated specification image to the display apparatus 3 such that the specification image is displayed on the display apparatus 3 before the live image is displayed on the display apparatus 4.

According to this, the display apparatus 4 can display the specification image having the low image quality, the live image to be broadcast live which has the higher image quality than the specification image, and the non-live image to be broadcast after recording which has the even higher image quality than the live image. It should be noted that the display apparatus 4 may also display only one of the live image and the non-live image, and in that case, the image processing apparatus 1 outputs the virtual viewpoint image suitable for the display apparatus 4. In addition, the display apparatus 3 can display the three types of the virtual viewpoint images including the virtual viewpoint image having the low image quality as the specification image, the virtual viewpoint image having the medium image quality as the live image, and the virtual viewpoint image having the high image quality as the non-live image. It should be noted that the display apparatus 3 need not display at least one of the live image and the non-live image.

That is, the image processing apparatus 1 outputs the specification image to the display apparatus 3 for the user to specify the virtual viewpoint. The image processing apparatus 1 then outputs at least one of the live image and the non-live image, which have the higher image quality than the specification image, to the display apparatus 4 for displaying the virtual viewpoint image generated on the basis of the specification of the virtual viewpoint by the user. According to this, it is possible to respond to both the requirements of the operator who desires to display the virtual viewpoint image with low delay for specifying the virtual viewpoint and the viewer who desires to observe the virtual viewpoint image having the high image quality.

It should be noted that, in the above-described processing, the virtual viewpoint image is generated on the basis of the images based on the capturing by the camera group 2 and the information in accordance with the specification of the virtual viewpoint, and the virtual viewpoint image having the high image quality is generated on the basis of the result of the processing for the generation. For this reason, the overall processing amount can be decreased as compared with a case where the virtual viewpoint image having the low image quality and the virtual viewpoint image having the high image quality are respectively generated by independent processes. It should be noted however that the virtual viewpoint image having the low image quality and the virtual viewpoint image having the high image quality may also be generated by independent processes. In addition, in a case where the virtual viewpoint image is displayed on a display installed in a competition venue or a concert venue or is broadcast live, that is, in a case where the image does not need to be broadcast after recording, the image processing apparatus 1 does not perform the processing for generating the non-live image. According to this, it is possible to save the processing amount that would be required for generating the non-live image having the high image quality.

In addition, the image processing apparatus 1 may generate a replay image to be displayed after capturing instead of the live image to be broadcast live or in addition to the live image. For example, in a case where the target of the capturing by the camera group 2 is a match such as soccer in the competition venue, the replay image is displayed on the display in the competition venue during halftime or after the end of the match. The replay image has a higher image quality than the specification image and is generated at such an image quality that the generation can be completed in time for the display at halftime or at the end of the match.

Next, another mode of the operation of the image processing apparatus 1 will be described by using FIG. 5. According to the operation mode described above by using FIG. 3, the virtual viewpoint image having the high image quality is generated by additionally performing processing of a new type after the virtual viewpoint image having the low image quality is generated. On the other hand, in the operation mode which will be described below by using FIG. 5, the increase in the image quality of the virtual viewpoint image is realized by increasing the number of cameras used for generating the virtual viewpoint image. In the following explanation, the descriptions of the parts similar to the processing in FIG. 3 will be omitted.

The processing illustrated in FIG. 5 is started at a timing when the viewpoint obtaining unit 110 accepts the generation instruction of the virtual viewpoint image. It should be noted however that the starting timing of the processing of FIG. 5 is not limited to this. In S2010 and S2020, the image processing apparatus 1 obtains the captured images by the respective cameras of the camera group 2 and the virtual viewpoint information by processing similar to that described in FIG. 3.

In S4030, the image generation unit 120 sets the number of cameras corresponding to the captured images used for the generation of the virtual viewpoint image. Herein, the image generation unit 120 sets the number of cameras such that the processing in S4050 to S4070 is completed in a processing time shorter than or equal to a predetermined threshold (for example, a time corresponding to one frame in a case where the virtual viewpoint image is a moving image). For example, it is supposed that the processing in S4050 to S4070 has been executed in advance by using the captured images of 100 cameras, and the processing time was 0.5 seconds. In this case, when the processing in S4050 to S4070 is desired to be completed in 0.016 seconds corresponding to one frame of the virtual viewpoint image in which the frame rate is 60 fps (frames per second), the number of cameras is set as 3.

It should be noted that, in a case where the continuation of the image generation is determined in S4080 after the virtual viewpoint image is output by the processing in S4050 to S4070, the flow returns to S4030, and the number of used cameras is set again. Herein, a permissible processing time is lengthened such that the virtual viewpoint image having the higher image quality than the previously output virtual viewpoint image is generated, and the number of cameras is increased accordingly. For example, the number of cameras corresponding to the captured images to be used is set as 20 such that the processing in S4050 to S4070 is completed in a processing time shorter than or equal to 0.1 seconds.
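The setting of the number of cameras in S4030 can be illustrated with the figures given above (100 cameras measured at 0.5 seconds). The sketch below assumes that the processing time of S4050 to S4070 scales linearly with the number of cameras, which the description implies by its examples but does not state; the function name cameras_for_budget and its parameters are hypothetical.

```python
import math


def cameras_for_budget(total_cameras: int, measured_time_s: float, budget_s: float) -> int:
    """Estimate how many cameras fit in the permissible processing time.

    Assumption: processing time is proportional to the number of cameras,
    based on one earlier measurement made with all `total_cameras` cameras.
    """
    if total_cameras < 1 or measured_time_s <= 0 or budget_s <= 0:
        raise ValueError("camera count and times must be positive")
    estimate = total_cameras * budget_s / measured_time_s
    # The small epsilon guards against a result that lands just below an
    # integer because of floating-point rounding.
    count = math.floor(estimate + 1e-9)
    # Use at least one camera and never more than are installed.
    return max(1, min(total_cameras, count))


if __name__ == "__main__":
    print(cameras_for_budget(100, 0.5, 0.016))  # one frame at 60 fps
    print(cameras_for_budget(100, 0.5, 0.1))    # lengthened permissible time
```

Under this linear assumption, the 0.016 second budget yields 3 cameras and the 0.1 second budget yields 20 cameras, in agreement with the examples in the description.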

In S4040, the image generation unit 120 selects the cameras corresponding to the captured images to be used for generating the virtual viewpoint image from the camera group 2 in accordance with the number of cameras set in S4030. For example, in a case where 3 cameras are selected from among 100 cameras, the camera closest to the virtual viewpoint and the 34th camera and the 67th camera counted from that camera are selected.

In addition, after the virtual viewpoint image is generated once, in a case where the processing is performed a second time by increasing the number of captured images to be used, a camera other than the cameras selected in the first processing is selected so that the accuracy of the shape model estimated in the first processing is further increased. Specifically, in a case where 20 cameras are selected from among 100 cameras, the camera closest to the virtual viewpoint is selected first from among the cameras that were not selected in the first processing, and further cameras are selected at intervals of five cameras. At this time, a camera already selected in the first processing is skipped, and the next camera is selected. It should be noted that, for example, in a case where the virtual viewpoint image having the highest image quality is generated as the non-live image, all the cameras included in the camera group 2 are selected, and the processing in S4050 to S4070 is executed by using the captured images of the respective cameras.
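The camera selection in S4040 can be sketched as follows. This is a minimal sketch under several assumptions that the description does not fix: the cameras are numbered 0 to N-1 in order around the subject, the selection interval is the integer quotient of the total number of cameras and the requested number, and "the closest camera that has not been selected" is approximated by advancing from the closest camera to the next unselected index. The name select_cameras is hypothetical.

```python
def select_cameras(total: int, count: int, closest: int,
                   already_selected: frozenset[int] = frozenset()) -> list[int]:
    """Pick `count` roughly evenly spaced cameras, skipping earlier picks."""
    if count < 1 or count > total - len(already_selected):
        raise ValueError("not enough unselected cameras")
    step = max(1, total // count)

    def next_free(index: int, taken: set[int]) -> int:
        # Skip a camera that is already in use and take the next one.
        while index in taken:
            index = (index + 1) % total
        return index

    taken = set(already_selected)
    start = next_free(closest % total, taken)
    chosen = []
    for k in range(count):
        index = next_free((start + k * step) % total, taken)
        chosen.append(index)
        taken.add(index)
    return chosen


if __name__ == "__main__":
    first = select_cameras(100, 3, closest=0)
    second = select_cameras(100, 20, closest=0, already_selected=frozenset(first))
    print(first)   # 0-based indices of the 1st, 34th and 67th cameras
    print(second)
```

With 0-based indices, the first call returns cameras 0, 33 and 66, that is, the closest camera and the 34th and 67th cameras counted from it. In the second call the candidate position 66 is already in use, so camera 67 is taken instead.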

It should be noted that the method of selecting the cameras corresponding to the captured images to be used is not limited to this. For example, the cameras close to the virtual viewpoint may be preferentially selected. In this case, the accuracy of the shape estimation of the back area that is not seen from the virtual viewpoint in the shape estimation of the object corresponding to the subject is decreased, but the accuracy of the shape estimation of the front side area that is seen from the virtual viewpoint is improved. That is, the image quality in the area easily observed by the viewer in the virtual viewpoint image can be preferentially improved.

In S4050, the image generation unit 120 executes the object shape estimation processing by using the captured images of the cameras selected in S4040. The processing here is, for example, a combination of the processing in S2030 in FIG. 3 (Visual Hull) and the processing in S2070 (Photo Hull). The processing of Visual Hull includes processing for calculating a logical product of visual volumes of the plurality of cameras corresponding to the plurality of captured images to be used. In addition, the processing of Photo Hull includes processing for projecting the respective points of the shape model onto the plurality of captured images and calculating consistency of the pixel values. For this reason, as the number of cameras corresponding to the captured images to be used is decreased, the accuracy of the shape estimation is decreased, and the processing time is shortened.
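The logical product of visual volumes used in Visual Hull can be illustrated by a simple voxel carving sketch. The representation below is an assumption made only for illustration: each camera is modelled as a hypothetical projection function paired with the set of silhouette pixels of the foreground, and a candidate voxel is kept only when every camera sees it inside its silhouette.

```python
from itertools import product
from typing import Callable

Voxel = tuple[int, int, int]
Pixel = tuple[int, int]
# Assumed camera model: (projection function, silhouette pixel set).
Camera = tuple[Callable[[Voxel], Pixel], set[Pixel]]


def visual_hull(candidates: set[Voxel], cameras: list[Camera]) -> set[Voxel]:
    """Keep the voxels that lie inside the visual volume of every camera."""
    return {
        voxel
        for voxel in candidates
        if all(project(voxel) in silhouette for project, silhouette in cameras)
    }


if __name__ == "__main__":
    grid = set(product(range(3), repeat=3))
    # Two orthographic toy cameras looking along the z axis and the x axis.
    top = (lambda v: (v[0], v[1]), {(1, 1)})
    side = (lambda v: (v[1], v[2]), {(1, 0), (1, 1)})
    print(sorted(visual_hull(grid, [top])))
    print(sorted(visual_hull(grid, [top, side])))
```

In this toy example, adding the second camera carves away one more voxel, while each added camera costs one more projection test per voxel, which is the trade-off between accuracy and processing time noted above.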

In S4060, the image generation unit 120 executes the rendering processing. The processing herein is similar to the processing in S2080 in FIG. 3 and includes the coloring processing for the 3D point group and the generation processing for the background image. The coloring processing for the 3D point group and the generation processing for the background image both include processing for deciding the color by a calculation using the pixel values of the corresponding points in the plurality of captured images. For this reason, as the number of cameras corresponding to the captured images to be used is decreased, the accuracy of the rendering is decreased, and the processing time is shortened.

In S4070, the output unit 130 outputs the virtual viewpoint image generated by the image generation unit 120 in S4060 to the display apparatus 3 or the display apparatus 4.

In S4080, the image generation unit 120 determines whether or not the processing for generating the virtual viewpoint image having the higher image quality than the virtual viewpoint image generated in S4060 is to be performed. For example, in a case where the virtual viewpoint image generated in S4060 is the image for the operator to specify the virtual viewpoint and the live image is to be further generated, the flow returns to S4030, and the virtual viewpoint image as the live image is generated by increasing the number of used cameras. In addition, in a case where the non-live image is to be further generated after the live image is generated, the virtual viewpoint image as the non-live image is generated by further increasing the number of cameras. That is, since the number of cameras corresponding to the captured images used for the generation of the virtual viewpoint image as the live image is higher than the number of cameras corresponding to the captured images used for the generation of the virtual viewpoint image as the specification image, the live image has the higher image quality than the specification image. Similarly, since the number of cameras corresponding to the captured images used for the generation of the virtual viewpoint image as the non-live image is higher than the number of cameras corresponding to the captured images used for the generation of the virtual viewpoint image as the live image, the non-live image has the higher image quality than the live image.

It should be noted that, in S4080, in a case where it is determined that the virtual viewpoint image having the higher image quality than the already generated virtual viewpoint image does not need to be generated or a case where it is determined that the virtual viewpoint image having the higher image quality cannot be generated, the processing is ended.

By the above-described processing, the image processing apparatus 1 can generate the plurality of virtual viewpoint images in which the image quality is improved stepwise and output them at respectively appropriate timings. For example, it is possible to generate the specification image with little delay by restricting the number of cameras to be used for the generation of the virtual viewpoint image to such a number that the generation processing can be completed in the set processing time. In addition, in a case where the live image and the non-live image are to be generated, it is possible to generate the images having the higher image quality by performing the generation processing with an increased number of used cameras.

Next, another mode of the operation of the image processing apparatus 1 will be described by using FIG. 6. According to the operation mode described above by using FIG. 5, the increase in the image quality of the virtual viewpoint image is realized by increasing the number of cameras used for generating the virtual viewpoint image. On the other hand, according to the operation mode which will be described below by using FIG. 6, the increase in the image quality of the virtual viewpoint image is realized by increasing a resolution of the virtual viewpoint image stepwise. In the following explanation, the descriptions of the parts similar to the processing in FIG. 3 or FIG. 5 will be omitted. It should be noted that, according to the operation mode which will be described below, the number of pixels of the virtual viewpoint image to be generated is constantly set to 4K (3840×2160), and the resolution of the virtual viewpoint image is controlled depending on whether the calculation of the pixel value is performed for each large pixel block or each small pixel block. It should be noted however that the configuration is not limited to this, and the resolution may be controlled by changing the number of pixels of the virtual viewpoint image to be generated.

The processing illustrated in FIG. 6 is started at a timing when the viewpoint obtaining unit 110 accepts the generation instruction of the virtual viewpoint image. It should be noted however that the starting timing of the processing of FIG. 6 is not limited to this. In S2010 and S2020, the image processing apparatus 1 obtains the captured images by the respective cameras of the camera group 2 and the virtual viewpoint information by processing similar to that described in FIG. 3.

In S5030, the image generation unit 120 sets a resolution of the virtual viewpoint image to be generated. Herein, the image generation unit 120 sets such a resolution that the processing in S5050 and S4070 is completed in a processing time shorter than or equal to a predetermined threshold. For example, it is supposed that the processing in S5050 and S4070 has previously been executed for generating the virtual viewpoint image having the 4K resolution and the processing time was 0.5 seconds. In this case, when the processing in S5050 and S4070 is desired to be completed in 0.016 seconds corresponding to one frame of the virtual viewpoint image in which the frame rate is 60 fps, the resolution needs to be 0.016/0.5=1/31.25 times that of 4K or lower. In view of the above, when the vertical and horizontal resolutions of the virtual viewpoint image are respectively set to be 1/8 times the 4K resolution, the number of pixel blocks for which the pixel value is to be calculated becomes 1/64, and the processing can be completed in less than 0.016 seconds.
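The resolution setting in S5030 can be worked through in a short sketch. It assumes that the processing time is proportional to the number of pixel blocks for which the pixel value is calculated, and, as a further assumption made only for this sketch, that the vertical and horizontal reduction factor is restricted to powers of two; the name resolution_divisor is hypothetical.

```python
def resolution_divisor(full_res_time_s: float, budget_s: float,
                       max_divisor: int = 64) -> int:
    """Return the smallest power-of-two divisor d such that generating the
    image at 1/d of the full vertical and horizontal resolution fits the budget.

    Assumption: time is proportional to the number of pixel blocks, so
    reducing both axes by d divides the time by d * d.
    """
    if full_res_time_s <= 0 or budget_s <= 0:
        raise ValueError("times must be positive")
    divisor = 1
    while full_res_time_s / (divisor * divisor) > budget_s:
        if divisor >= max_divisor:
            raise ValueError("budget cannot be met within max_divisor")
        divisor *= 2
    return divisor


if __name__ == "__main__":
    d = resolution_divisor(0.5, 0.016)
    print(d, 0.5 / (d * d))  # divisor and estimated processing time
```

For the 0.5 second measurement and the 0.016 second budget, a divisor of 4 gives an estimate of 0.5/16 = 0.03125 seconds, which is too slow, whereas a divisor of 8 gives 0.5/64 = 0.0078125 seconds, which fits the budget.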

It should be noted that, in a case where it is determined in S4080 that the image generation is to be continued after the virtual viewpoint image is output by the processing in S5050 and S4070, the flow returns to S5030, and the resolution is set again. Herein, the permissible processing time is lengthened such that the virtual viewpoint image having the higher image quality than the previously output virtual viewpoint image is generated, and the resolution is increased accordingly. For example, when the vertical and horizontal resolutions are respectively set to be 1/4 of the 4K resolution, the processing in S5050 and S4070 can be completed in a processing time shorter than or equal to 0.1 seconds.

In S5040, the image generation unit 120 decides the positions of the pixels where the pixel value is to be calculated in the virtual viewpoint image in accordance with the resolution set in S5030. For example, in a case where the resolution of the virtual viewpoint image is set to be 1/8 of the 4K resolution, the pixel values are respectively calculated for every eighth pixel vertically and horizontally. Then, the same pixel value as that of the pixel (x, y) is set for the pixels existing between the pixel (x, y) and the pixel (x+8, y+8) where the pixel values are calculated.

In addition, after the virtual viewpoint image is generated once, in a case where the processing is performed a second time by increasing the resolution, the pixel value is calculated while skipping the pixels where the pixel value was calculated the first time. For example, in a case where the resolution is set to be 1/4 of the 4K resolution, the pixel value of the pixel (x+4, y+4) is calculated, and the same pixel value as that of the pixel (x+4, y+4) is set for the pixels existing between the pixel (x+4, y+4) and the pixel (x+8, y+8). In this manner, when the number of pixels where the pixel value is calculated is increased, the resolution of the virtual viewpoint image can be increased up to the 4K resolution at maximum.
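The block-wise calculation and the stepwise refinement can be sketched as follows. This is an illustrative sketch only: shade stands in for the actual pixel value calculation of S5050 and is a hypothetical callback, the image is a plain list of rows, and the sketch refines every position of the finer grid that has not yet been calculated, which is one possible reading of the skipping described above.

```python
from typing import Callable


def refine(image: list[list[int]], computed: dict[tuple[int, int], int],
           block: int, shade: Callable[[int, int], int]) -> int:
    """Fill `image` using one calculated value per `block` x `block` area.

    Positions already present in `computed` are reused rather than
    recalculated. Returns the number of new calculations performed.
    """
    height, width = len(image), len(image[0])
    evaluations = 0
    for y0 in range(0, height, block):
        for x0 in range(0, width, block):
            if (x0, y0) not in computed:
                computed[(x0, y0)] = shade(x0, y0)
                evaluations += 1
            value = computed[(x0, y0)]
            # Copy the calculated value to every pixel of the block.
            for y in range(y0, min(y0 + block, height)):
                for x in range(x0, min(x0 + block, width)):
                    image[y][x] = value
    return evaluations


if __name__ == "__main__":
    image = [[0] * 16 for _ in range(16)]
    cache: dict[tuple[int, int], int] = {}
    shade = lambda x, y: x + 100 * y  # toy stand-in for the real calculation
    print(refine(image, cache, 8, shade))  # first pass: 1/8-style blocks
    print(refine(image, cache, 4, shade))  # second pass reuses earlier values
```

On the 16×16 toy image, the first pass performs 4 calculations and the second pass performs only 12 of the 16 that the finer grid contains, because the 4 positions from the first pass are skipped.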

In S5050, the image generation unit 120 performs the coloring processing for the virtual viewpoint image by calculating the pixel values of the pixels in the positions decided in S5040. As a calculation method for the pixel value, for example, a method of Image-Based Visual Hull can be used. Since the pixel value is calculated for each pixel according to this method, as the number of pixels where the pixel value is to be calculated is decreased, that is, as the resolution of the virtual viewpoint image is lowered, the processing time is shortened.

In S4070, the output unit 130 outputs the virtual viewpoint image generated by the image generation unit 120 in S5050 to the display apparatus 3 or the display apparatus 4.

In S4080, the image generation unit 120 determines whether or not the processing for generating the virtual viewpoint image having the higher image quality than the virtual viewpoint image generated in S5050 is to be performed. For example, in a case where the virtual viewpoint image generated in S5050 is the image for the operator to specify the virtual viewpoint and the live image is to be further generated, the flow returns to S5030, and the virtual viewpoint image having the increased resolution is generated. In addition, in a case where the non-live image is to be further generated after the live image is generated, the virtual viewpoint image as the non-live image in which the resolution is further increased is generated. That is, since the virtual viewpoint image as the live image has the higher resolution than the virtual viewpoint image as the specification image, the live image has the higher image quality than the specification image. Similarly, since the virtual viewpoint image as the non-live image has the higher resolution than the virtual viewpoint image as the live image, the non-live image has the higher image quality than the live image.

It should be noted that, in S4080, in a case where it is determined that the virtual viewpoint image having the higher image quality than the already generated virtual viewpoint image does not need to be generated or a case where it is determined that the virtual viewpoint image having the higher image quality cannot be generated, the processing is ended.

By the above-described processing, the image processing apparatus 1 can generate the plurality of virtual viewpoint images in which the resolution is improved stepwise and output them at respectively appropriate timings. For example, it is possible to generate the specification image with little delay by setting the resolution of the virtual viewpoint image to such a resolution that the generation processing can be completed in the set processing time. In addition, in a case where the live image and the non-live image are to be generated, it is possible to generate the images having the higher image quality by performing the generation processing with an increased resolution.

As described above, the image processing apparatus 1 performs the image processing for improving the image quality of the virtual viewpoint image to generate the image having the high image quality (for example, the non-live image). The image processing apparatus 1 also generates the image having the low image quality (for example, the live image) by processing that corresponds to a part of the image processing and that is executed in a processing time shorter than or equal to the predetermined threshold. According to this, both the virtual viewpoint image to be displayed with a delay shorter than or equal to the predetermined time and the virtual viewpoint image having the high image quality can be generated and displayed.

It should be noted that, in the explanation of FIG. 6, it is assumed that the generation parameter (resolution) for completing the generation processing in a processing time shorter than or equal to the predetermined threshold is estimated, and the virtual viewpoint image is generated by using the estimated generation parameter. It should be noted however that the configuration is not limited to this, and the image processing apparatus 1 may improve the image quality of the virtual viewpoint image stepwise and output the already generated virtual viewpoint image at a time point when the processing time reaches a predetermined threshold. For example, in a case where the virtual viewpoint image in which the resolution is 1/8 of the 4K resolution is already generated and the virtual viewpoint image in which the resolution is 1/4 of the 4K resolution is not completed at the time point when the processing time reaches the predetermined threshold, the virtual viewpoint image in which the resolution is 1/8 may be output. Alternatively, the virtual viewpoint image for which the processing for improving the resolution from the 1/8 resolution to the 1/4 resolution is partway through may be output.
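The alternative of improving the image quality stepwise and outputting whatever has been completed when the threshold is reached can be sketched as below. The sketch is a simplification under stated assumptions: each quality level is rendered to completion before the elapsed time is checked (a real implementation would likely interrupt the rendering), render is a hypothetical callback, and the clock is injectable so that the behaviour can be checked without real waiting.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

Image = TypeVar("Image")


def progressive_render(levels: Sequence[int], render: Callable[[int], Image],
                       budget_s: float,
                       clock: Callable[[], float] = time.monotonic) -> Optional[Image]:
    """Render from coarse to fine and return the best image finished in time.

    `levels` lists resolution divisors from coarse to fine, e.g. [8, 4, 1].
    Returns None if not even the coarsest level finished within the budget.
    """
    start = clock()
    best = None
    for level in levels:
        if clock() - start >= budget_s:
            break  # threshold already reached: output what exists
        candidate = render(level)
        if clock() - start > budget_s:
            break  # this level finished too late: keep the previous image
        best = candidate
    return best


if __name__ == "__main__":
    now = [0.0]
    cost = {8: 0.01, 4: 0.03, 1: 0.5}  # made-up rendering times per level

    def fake_render(level: int) -> str:
        now[0] += cost[level]  # advance the simulated clock
        return f"1/{level} resolution"

    print(progressive_render([8, 4, 1], fake_render, 0.016, clock=lambda: now[0]))
```

With the simulated costs above and a threshold of 0.016 seconds, the 1/8 resolution image is output because the 1/4 resolution image is not completed in time.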

According to the present embodiment, the case has been mainly described where the image generation unit 120 included in the image processing apparatus 1 controls the generation of the virtual viewpoint image on the basis of the image obtained by the camera information obtaining unit 100 and the virtual viewpoint information obtained by the viewpoint obtaining unit 110 and generates the plurality of virtual viewpoint images having the different image qualities. It should be noted however that the configuration is not limited to this, and the function for controlling the generation of the virtual viewpoint image and the function for actually generating the virtual viewpoint image may be included in respectively different apparatuses.

For example, a generation apparatus (not illustrated) that has a function of the image generation unit 120 and generates the virtual viewpoint image may also exist in the image processing system 10. Then, the image processing apparatus 1 may control the generation of the virtual viewpoint image by the generation apparatus on the basis of the image obtained by the camera information obtaining unit 100 and the information obtained by the viewpoint obtaining unit 110. Specifically, the image processing apparatus 1 transmits the captured images and the virtual viewpoint information to the generation apparatus and issues the instruction for controlling the generation of the virtual viewpoint image. The generation apparatus then generates, on the basis of the received captured images and virtual viewpoint information, a first virtual viewpoint image and a second virtual viewpoint image that is to be displayed at a timing earlier than the display of the first virtual viewpoint image, the second virtual viewpoint image having the lower image quality than the first virtual viewpoint image. Herein, the first virtual viewpoint image is, for example, the non-live image, and the second virtual viewpoint image is, for example, the live image. It should be noted however that the use purpose for the first virtual viewpoint image and the second virtual viewpoint image is not limited to this. It should be noted that the image processing apparatus 1 may perform the control such that the first virtual viewpoint image and the second virtual viewpoint image are generated by respectively different generation apparatuses. In addition, the image processing apparatus 1 may perform output control for controlling the output destination, the output timing, and the like of the virtual viewpoint image output by the generation apparatus.

In addition, the generation apparatus may include the functions of the viewpoint obtaining unit 110 and the image generation unit 120, and the image processing apparatus 1 may control the generation of the virtual viewpoint image by the generation apparatus on the basis of the images obtained by the camera information obtaining unit 100. Herein, the images obtained by the camera information obtaining unit 100 are images based on the capturing such as the captured images captured by the camera group 2 and the images generated on the basis of the difference between the plurality of captured images. In addition, the generation apparatus may include the functions of the camera information obtaining unit 100 and the image generation unit 120, and the image processing apparatus 1 may control the generation of the virtual viewpoint image by the generation apparatus on the basis of the information obtained by the viewpoint obtaining unit 110. Herein, the information obtained by the viewpoint obtaining unit 110 is the information in accordance with the specification of the virtual viewpoint, such as the virtual viewpoint information and the information indicating the contents determined in accordance with the virtual viewpoint such as the shape or the orientation of the subject in the virtual viewpoint image. That is, the image processing apparatus 1 may obtain the information related to the generation of the virtual viewpoint image including at least one of the images based on the capturing and the information in accordance with the specification of the virtual viewpoint and control the generation of the virtual viewpoint image on the basis of the obtained information.

In addition, for example, the generation apparatus that exists in the image processing system 10 may include the functions of the camera information obtaining unit 100, the viewpoint obtaining unit 110, and the image generation unit 120, and the image processing apparatus 1 may control the generation of the virtual viewpoint image by the generation apparatus on the basis of the information related to the generation of the virtual viewpoint image. The information related to the generation of the virtual viewpoint image in this case includes, for example, at least one of parameters with regard to the image quality of the first virtual viewpoint image and parameters with regard to the image quality of the second virtual viewpoint image which are generated by the generation apparatus. Specific examples of the parameters with regard to the image quality include the number of cameras corresponding to the captured images used for the generation of the virtual viewpoint image, the resolution of the virtual viewpoint image, a permissible time as the processing time related to the generation of the virtual viewpoint image, and the like. The image processing apparatus 1 obtains these parameters with regard to the image quality on the basis of the input by the operator, for example, and controls the generation apparatus on the basis of the obtained parameters by transmitting the parameters to the generation apparatus or the like. According to this, the operator can generate the plurality of virtual viewpoint images having the mutually different desired image qualities.
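The parameters with regard to the image quality that are transmitted to the generation apparatus could, for instance, be grouped as in the following sketch. The description does not specify any data format or transport, so the field names, the QualityParameters type, and the use of JSON are all assumptions made for illustration; the example values reuse figures from the operation modes described above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class QualityParameters:
    """Hypothetical container for the image quality parameters."""
    num_cameras: int         # cameras whose captured images are used
    resolution_divisor: int  # 1 means the full (e.g. 4K) resolution
    time_budget_s: float     # permissible processing time


def encode(params: QualityParameters) -> str:
    """Serialize the parameters for transmission to a generation apparatus."""
    return json.dumps(asdict(params), sort_keys=True)


def decode(message: str) -> QualityParameters:
    """Rebuild the parameters on the receiving side."""
    return QualityParameters(**json.loads(message))


if __name__ == "__main__":
    # Example values for a specification image and a live image.
    specification = QualityParameters(num_cameras=3, resolution_divisor=8,
                                      time_budget_s=0.016)
    live = QualityParameters(num_cameras=20, resolution_divisor=4,
                             time_budget_s=0.1)
    for params in (specification, live):
        print(encode(params))
```

Keeping one parameter set per virtual viewpoint image lets the operator request several images of mutually different image qualities with the same instruction format.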

As described above, the image processing apparatus 1 accepts the generation instruction of the virtual viewpoint image based on the images based on the capturing of the subject from the respectively different directions by the plurality of cameras and the information in accordance with the specification of the virtual viewpoint. The image processing apparatus 1 then performs the control in accordance with the acceptance of the generation instruction such that the first virtual viewpoint image to be output to a first display apparatus and the second virtual viewpoint image to be output to a second display apparatus are generated on the basis of the images based on the capturing and the information in accordance with the specification of the virtual viewpoint. Herein, the second virtual viewpoint image is a virtual viewpoint image having the higher image quality than the first virtual viewpoint image. According to this, for example, also in a case where both the user who desires to observe the virtual viewpoint image in real time and the user who prioritizes the high image quality of the virtual viewpoint image over the real-time property exist, it is possible to generate the virtual viewpoint image suitable for the timing when the display is to be performed.

It should be noted that, according to the present embodiment, the case has been described where the color gradation, the resolution, and the number of cameras corresponding to the captured images used for the generation of the virtual viewpoint image are controlled as the image quality of the virtual viewpoint image, but other parameters may be controlled as the image quality. In addition, a plurality of parameters with regard to the image quality may be controlled at the same time.

The present invention can also be realized by processing in which a program that realizes one or more functions of the above-described embodiments is supplied to a system or an apparatus via a network or a storage medium, and one or more processors in a computer of the system or the apparatus read out and execute the program. In addition, the present invention can be realized by a circuit (for example, an ASIC or the like) that realizes one or more functions. In addition, the program may be provided by being recorded in a computer-readable recording medium.

According to the present invention, it is possible to generate the virtual viewpoint image in accordance with the plurality of different requirements with regard to the image quality.

While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.

1. An image processing system comprising: one or more hardware processors; and one or more memories which store instructions executable by the one or more hardware processors to cause the image processing system to perform at least: obtaining a plurality of images captured from respectively different positions by a plurality of image capturing apparatuses; accepting an input corresponding to an operation for specifying a virtual viewpoint; and generating a plurality of virtual viewpoint images based on the obtained plurality of images and the accepted input, the plurality of virtual viewpoint images including a first virtual viewpoint image and a second virtual viewpoint image, wherein a number of image capturing apparatuses corresponding to an image to be used for generating the second virtual viewpoint image is larger than a number of image capturing apparatuses corresponding to an image to be used for generating the first virtual viewpoint image, and wherein a plurality of image capturing apparatuses corresponding to images to be used for generating the second virtual viewpoint image comprises a plurality of image capturing apparatuses corresponding to images to be used for generating the first virtual viewpoint image.
2. The image processing system according to claim 1, wherein the second virtual viewpoint image has a larger image data size per frame of a moving image than that of the first virtual viewpoint image.
3. The image processing system according to claim 1, wherein the first virtual viewpoint image is a virtual viewpoint image to be broadcast live, and wherein the second virtual viewpoint image is a virtual viewpoint image to be broadcast after recording.
4. The image processing system according to claim 1, wherein the second virtual viewpoint image is generated based on a three dimensional model of an object which is generated based on the obtained plurality of images and whose accuracy is higher than that of a three dimensional model of the object used for generating the first virtual viewpoint image.
5. The image processing system according to claim 1, wherein the second virtual viewpoint image is an image to be displayed for a viewer to see an image generated in response to the operation.
6. The image processing system according to claim 1, wherein the instructions further cause the image processing system to perform outputting the first virtual viewpoint image and the second virtual viewpoint image, wherein timing when the first virtual viewpoint image is output is earlier than timing when the second virtual viewpoint image is output.
7. The image processing system according to claim 1, wherein the instructions further cause the image processing system to perform controlling output of the first virtual viewpoint image and the second virtual viewpoint image so that the first virtual viewpoint image is displayed in a first display area and the second virtual viewpoint image is displayed in a second display area different from the first display area.
8. The image processing system according to claim 1, wherein the second virtual viewpoint image is generated by performing image processing using at least one of image data to be generated in a process of generating the first virtual viewpoint image from the obtained image and the first virtual viewpoint image.
9. The image processing system according to claim 1, wherein the second virtual viewpoint image is generated by performing image processing of increasing an image data size per frame of a moving image on another virtual viewpoint image generated based on the plurality of images and the input, and the first virtual viewpoint image is generated by performing processing that is a part of processing of generating the second virtual viewpoint image from said another virtual viewpoint image and that is executed in a processing time smaller than or equal to a predetermined value.
10. The image processing system according to claim 1, wherein the first virtual viewpoint image is an image indicating a shape of an object to be captured by at least one of the plurality of image capturing apparatuses, and wherein the second virtual viewpoint image is an image indicating a color of the object, the color not appearing in the first virtual viewpoint image, in addition to the shape of the object.
11. The image processing system according to claim 1, wherein a number of gradations of colors included in the second virtual viewpoint image is larger than a number of gradations of colors included in the first virtual viewpoint image.
12. The image processing system according to claim 1, wherein a resolution of the second virtual viewpoint image is higher than a resolution of the first virtual viewpoint image.
13. The image processing system according to claim 1, wherein the instructions further cause the image processing system to perform outputting a parameter related to image quality of a virtual viewpoint image to the generation means, wherein the generation means generates the first virtual viewpoint image and the second virtual viewpoint image based on the parameter related to image quality to be output from the output means.
14. The image processing system according to claim 13, wherein the parameter related to image quality includes at least one of a number of image capturing apparatuses corresponding to a captured image to be used to generate a virtual viewpoint image, a resolution of a virtual viewpoint image, and a permissible time as a processing time for generating a virtual viewpoint image.
15. The image processing system according to claim 1, wherein a processing amount of image processing for generating the second virtual viewpoint image is larger than a processing amount of image processing for generating the first virtual viewpoint image.
16. An image processing method comprising: obtaining a plurality of images captured from respectively different positions by a plurality of image capturing apparatuses; accepting an input corresponding to an operation for specifying a virtual viewpoint; and generating a plurality of virtual viewpoint images based on the obtained plurality of images and the accepted input, the plurality of virtual viewpoint images including a first virtual viewpoint image and a second virtual viewpoint image, wherein a number of image capturing apparatuses corresponding to an image to be used for generating the second virtual viewpoint image is larger than a number of image capturing apparatuses corresponding to an image to be used for generating the first virtual viewpoint image, and wherein a plurality of image capturing apparatuses corresponding to images to be used for generating the second virtual viewpoint image comprises a plurality of image capturing apparatuses corresponding to images to be used for generating the first virtual viewpoint image.
17. The image processing method according to claim 16, wherein the second virtual viewpoint image has a larger image data size per frame of a moving image than that of the first virtual viewpoint image.
18. The image processing method according to claim 16, wherein the first virtual viewpoint image is a virtual viewpoint image to be broadcast live, and wherein the second virtual viewpoint image is a virtual viewpoint image to be broadcast after recording.
19. The image processing method according to claim 16, wherein the first virtual viewpoint image is an image indicating a shape of an object to be captured by at least one of the plurality of image capturing apparatuses, and wherein the second virtual viewpoint image is an image indicating a color of the object, the color not appearing in the first virtual viewpoint image, in addition to the shape of the object.
20. A non-transitory storage medium that stores a program for causing a computer to execute an image processing method, the image processing method comprising: obtaining a plurality of images captured from respectively different positions by a plurality of image capturing apparatuses; accepting an input corresponding to an operation for specifying a virtual viewpoint; and generating a plurality of virtual viewpoint images based on the obtained plurality of images and the accepted input, the plurality of virtual viewpoint images including a first virtual viewpoint image and a second virtual viewpoint image, wherein a number of image capturing apparatuses corresponding to an image to be used for generating the second virtual viewpoint image is larger than a number of image capturing apparatuses corresponding to an image to be used for generating the first virtual viewpoint image, and wherein a plurality of image capturing apparatuses corresponding to images to be used for generating the second virtual viewpoint image comprises a plurality of image capturing apparatuses corresponding to images to be used for generating the first virtual viewpoint image.